Add deepseek ocr #41797

molbap · 2025-10-22T21:04:03Z

What does this PR do?

As per title. Architecturally, Llava-next serves as the skeleton, with a modified SamModel and a modified ClipVisionModel. The deepseekV2 decoder is kept untouched (loaded through AutoModel) and adapted through config only.
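To make the layout above concrete, here is a minimal sketch of a composite config with two vision sub-configs and one text sub-config. It uses plain dataclasses rather than the real transformers config classes, and every class name, field name, and default value below is a placeholder invented for illustration, not taken from this PR.

```python
from dataclasses import dataclass, field


# All names and numbers here are hypothetical placeholders, not the PR's values.
@dataclass
class SamVisionSketchConfig:
    hidden_size: int = 768
    patch_size: int = 16


@dataclass
class ClipVisionSketchConfig:
    hidden_size: int = 1024
    patch_size: int = 14


@dataclass
class TextSketchConfig:
    # Stand-in for the unmodified decoder config; only its values change.
    model_type: str = "deepseek_v2"
    hidden_size: int = 1280


@dataclass
class OcrSketchConfig:
    # Two vision sub-configs plus one text sub-config, mirroring the description.
    sam_vision_config: SamVisionSketchConfig = field(default_factory=SamVisionSketchConfig)
    clip_vision_config: ClipVisionSketchConfig = field(default_factory=ClipVisionSketchConfig)
    text_config: TextSketchConfig = field(default_factory=TextSketchConfig)


# The decoder is adapted by overriding config values, not by changing its code.
config = OcrSketchConfig(text_config=TextSketchConfig(hidden_size=2048))
```

The point of the design is that the decoder class stays shared with its original model, so only the sub-config values differ between the two.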

Working config + random weights init
Modular draft with subconfigs (two vision configs)
Conversion from original checkpoint done
Modular model finished
Integration tests/OCR tests working as in original codebase
Make modular slimmer
Make processor faster
Complete test suite for transformers
Remap weights to avoid conversion / on-the-fly conversion? (cc @ArthurZucker)
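For the last item, on-the-fly remapping could in principle be a table of regex renames applied to checkpoint keys at load time. The sketch below is only an illustration of that idea: the rename rules, the key prefixes, and the helper names `remap_key` and `remap_state_dict` are hypothetical and do not reflect the actual checkpoint layout or any existing transformers API.

```python
import re

# Hypothetical rename rules: (regex on the original key, replacement prefix).
KEY_RENAMES = [
    (r"^sam_model\.", "model.sam_model."),
    (r"^vision_model\.", "model.clip_model."),
    (r"^projector\.", "model.multi_modal_projector."),
]


def remap_key(key: str) -> str:
    """Apply the first matching rename rule; leave unmatched keys untouched."""
    for pattern, replacement in KEY_RENAMES:
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key


def remap_state_dict(state_dict: dict) -> dict:
    """Return a new dict with renamed keys, refusing silent collisions."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = remap_key(key)
        if new_key in remapped:
            raise ValueError(f"Two keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped
```

Because the values are passed through unchanged, such a mapping would only cover pure renames; any tensor reshaping or splitting would still need a conversion step.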

The current branch is functional. Convert the weights, run the following on your image, and you get a nice OCR output.

import torch

from transformers import DeepseekOcrForConditionalGeneration, DeepseekOcrProcessor


# "deepseek_ocr_converted" is the directory with the converted weights.
processor = DeepseekOcrProcessor.from_pretrained("deepseek_ocr_converted")
model = DeepseekOcrForConditionalGeneration.from_pretrained("deepseek_ocr_converted", dtype=torch.bfloat16)

# The image is loaded by the processor from the path given in the conversation.
conversation = [
    {
        "role": "<|User|>",
        "content": [
            {"type": "image", "path": "./handwritten_letter_small.png"},
            {"type": "text", "text": "<|grounding|>Convert the document to markdown."},
        ],
    }
]

inputs = processor.apply_chat_template(
    conversation,
    return_dict=True,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50)

# Special tokens are kept in the decoded text.
text = processor.batch_decode(generated, skip_special_tokens=False)[0]
print(text.strip())
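Note that for decoder-only generation the returned sequences normally start with the prompt tokens, so the decoded text above includes the prompt. A minimal sketch of trimming the prompt before decoding follows; `strip_prompt` is a hypothetical helper written on plain lists so it runs without a model, and the token ids are made up.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` tokens of every generated sequence."""
    return [list(seq[prompt_length:]) for seq in sequences]


# Made-up token ids: three prompt tokens followed by two newly generated ones.
generated_ids = [[1, 2, 3, 40, 41]]
new_tokens = strip_prompt(generated_ids, prompt_length=3)
print(new_tokens)
```

With tensors, the equivalent slice would be `generated[:, inputs["input_ids"].shape[1]:]`, assuming the processor output exposes `input_ids` as it usually does.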

molbap · 2025-10-28T18:25:13Z

The implementation works. The processor still needs to be optimized, but the results are similar to those of the original repository.

HuggingFaceDocBuilderDev · 2025-10-28T18:33:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2025-11-03T20:15:19Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_ocr

molbap added 24 commits October 22, 2025 12:05

hop

e931114

iterate

60a825b

fix

72c640c

fixup

690455e

make things simple

e2182c3

update conversion

20e3f6c

I believe this is not needed

7099e23

imports breathing better

edbbd0a

650 loc modular

20c5e0f

conversion and test running

0031e8f

add modular of course

7c0a2f2

naming

3a148ba

no mla deepseek

1b36afb

up

b63c11a

update

92d13ca

update

cfe15ed

Merge branch 'main' into deepseek_ocr

c820daa

fix 'template'

165014d

tosquash

b5acad8

much better (squash too)

34b41d4

tweak

9a2e47f

ugly moe_infer path but nice generation

a52ec39

nice

cec4fb3

cleaner routing

903cb2a

molbap marked this pull request as ready for review October 28, 2025 18:25

molbap changed the title ~~[WIP] add deepseek ocr~~ Add deepseek ocr Oct 28, 2025

molbap added 2 commits October 30, 2025 14:17

Merge branch 'main' into deepseek_ocr

48fdb92

doc, draft tests

39b2683

molbap added 5 commits October 30, 2025 18:29

fixup

b7ca8e4

improve config

337ec00

some improvements

17c3a3a

tests

d14566e

fixup

f7c8ccc
